Search CORE

220 research outputs found

Efficient Construction of Probabilistic Tree Embeddings

Author: Blelloch Guy E.
Gu Yan
Sun Yihan
Publication venue
Publication date: 01/01/2017
Field of study

In this paper we describe an algorithm that embeds a graph metric

(V,d_G)

on an undirected weighted graph

G=(V,E)

into a distribution of tree metrics

(T,D_T)

such that for every pair

u,v\in V

d_G(u,v)\leq d_T(u,v)

and

{\bf{E}}_{T}[d_T(u,v)]\leq O(\log n)\cdot d_G(u,v)

. Such embeddings have proved highly useful in designing fast approximation algorithms, as many hard problems on graphs are easy to solve on tree instances. For a graph with

n

vertices and

m

edges, our algorithm runs in

O(m\log n)

time with high probability, which improves the previous upper bound of

O(m\log^3 n)

shown by Mendel et al.\,in 2009. The key component of our algorithm is a new approximate single-source shortest-path algorithm, which implements the priority queue with a new data structure, the "bucket-tree structure". The algorithm has three properties: it only requires linear time in the number of edges in the input graph; the computed distances have a distance preserving property; and when computing the shortest-paths to the

k

-nearest vertices from the source, it only requires to visit these vertices and their edge lists. These properties are essential to guarantee the correctness and the stated time bound. Using this shortest-path algorithm, we show how to generate an intermediate structure, the approximate dominance sequences of the input graph, in

O(m \log n)

time, and further propose a simple yet efficient algorithm to converted this sequence to a tree embedding in

O(n\log n)

time, both with high probability. Combining the three subroutines gives the stated time bound of the algorithm. Then we show that this efficient construction can facilitate some applications. We proved that FRT trees (the generated tree embedding) are Ramsey partitions with asymptotically tight bound, so the construction of a series of distance oracles can be accelerated

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

The Geometry of Tree-Based Sorting

Author: Blelloch Guy E.
Dobson Magdalen
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)
Publication date: 01/01/2023
Field of study

We study the connections between sorting and the binary search tree (BST) model, with an aim towards showing that the fields are connected more deeply than is currently appreciated. While any BST can be used to sort by inserting the keys one-by-one, this is a very limited relationship and importantly says nothing about parallel sorting. We show what we believe to be the first formal relationship between the BST model and sorting. Namely, we show that a large class of sorting algorithms, which includes mergesort, quicksort, insertion sort, and almost every instance-optimal sorting algorithm, are equivalent in cost to offline BST algorithms. Our main theoretical tool is the geometric interpretation of the BST model introduced by Demaine et al. [Demaine et al., 2009], which finds an equivalence between searches on a BST and point sets in the plane satisfying a certain property. To give an example of the utility of our approach, we introduce the log-interleave bound, a measure of the information-theoretic complexity of a permutation ?, which is within a lg lg n multiplicative factor of a known lower bound in the BST model; we also devise a parallel sorting algorithm with polylogarithmic span that sorts a permutation ? using comparisons proportional to its log-interleave bound. Our aforementioned result on sorting and offline BST algorithms can be used to show existence of an offline BST algorithm whose cost is within a constant factor of the log-interleave bound of any permutation ?

Dagstuhl Research Online Publication Server

Brief Announcement: Concurrent Fixed-Size Allocation and Free in Constant Time

Author: Blelloch Guy E.
Wei Yuanhao
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th International Symposium on Distributed Computing (DISC 2020)
Publication date: 01/01/2020
Field of study

Dagstuhl Research Online Publication Server

LL/SC and Atomic Copy: Constant Time, Space Efficient Implementations Using Only Pointer-Width CAS

Author: Blelloch Guy E.
Wei Yuanhao
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th International Symposium on Distributed Computing (DISC 2020)
Publication date: 01/01/2020
Field of study

When designing concurrent algorithms, Load-Link/Store-Conditional (LL/SC) is often the ideal primitive to have because unlike Compare and Swap (CAS), LL/SC is immune to the ABA problem. However, the full semantics of LL/SC are not supported by any modern machine, so there has been a significant amount of work on simulations of LL/SC using Compare and Swap (CAS), a synchronization primitive that enjoys widespread hardware support. All of the algorithms so far that are constant time either use unbounded sequence numbers (and thus base objects of unbounded size), or require

\Omega(MP)

space for

M

LL/SC object (where

P

is the number of processes). We present a constant time implementation of

M

LL/SC objects using

\Theta(M+kP^2)

space, where

k

is the maximum number of overlapping LL/SC operations per process (usually a constant), and requiring only pointer-sized CAS objects. Our implementation can also be used to implement

L

-word

LL/SC

objects in

\Theta(L)

time (for both

LL

and

SC

) and

\Theta((M+kP^2)L)

space. To achieve these bounds, we begin by implementing a new primitive called Single-Writer Copy which takes a pointer to a word sized memory location and atomically copies its contents into another object. The restriction is that only one process is allowed to write/copy into the destination object at a time. We believe this primitive will be very useful in designing other concurrent algorithms as well

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Optimal (Randomized) Parallel Algorithms in the Binary-Forking Model

Author: Acar U. A.
Acar Umut A.
Agrawal Kunal
Agrawal Kunal
Akhremtsev Yaroslav
Arora N. S.
Ben-David Naama
Ben-David Naama
Blelloch Guy E
Blelloch Guy E
Blelloch Guy E
Blelloch Guy E.
Blelloch Guy E.
Blelloch Guy E.
Blumofe Robert D.
Cole Richard
Cole Richard
Cole Richard
Dhulipala Laxman
Dhulipala Laxman
Gil J.
Goodrich Michael T.
Gustedt Jens
Guy
Guy
Miller G.L.
Nievergelt Jürg
Rajasekaran S.
Valiant L. G.
Vishkin Uzi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/06/2020
Field of study

In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and ordered set union, intersection and difference. In the binary-forking model, tasks can only fork into two child tasks, but can do so recursively and asynchronously. The tasks share memory, supporting reads, writes and test-and-sets. Costs are measured in terms of work (total number of instructions), and span (longest dependence chain). The binary-forking model is meant to capture both algorithm performance and algorithm-design considerations on many existing multithreaded languages, which are also asynchronous and rely on binary forks either explicitly or under the covers. In contrast to the widely studied PRAM model, it does not assume arbitrary-way forks nor synchronous operations, both of which are hard to implement in modern hardware. While optimal PRAM algorithms are known for the problems studied herein, it turns out that arbitrary-way forking and strict synchronization are powerful, if unrealistic, capabilities. Natural simulations of these PRAM algorithms in the binary-forking model (i.e., implementations in existing parallel languages) incur an

\Omega(\log n)

overhead in span. This paper explores techniques for designing optimal algorithms when limited to binary forking and assuming asynchrony. All algorithms described in this paper are the first algorithms with optimal work and span in the binary-forking model. Most of the algorithms are simple. Many are randomized

arXiv.org e-Print Archive

Crossref

Accounting for Memory Bank Contention and Delay in High-Bandwidth Multiprocessors

Author: Guy E. Blelloch
Marco Zagha
Phillip B. Gibbons
Yossi Matias
Publication venue
Publication date: 01/01/1995
Field of study

This paper considers issues of memory performance in shared memory multiprocessors that provide a high-bandwidth network and in which the memory banks are slower than the processors. We are concerned with the effects of memory bank contention, memory bank delay, and the bank expansion factor (the ratio of number of banks to number of processors) on performance, particularly for irregular memory access patterns. This work was motivated by observed discrepancies between predicted and actual performance in a number of irregular algorithms implemented for the cray C90 when the memory contention at a particular location is high. We develop a formal framework for studying memory bank contention and delay, and show several results, both experimental and theoretical. We first show experimentally that our framework is a good predictor of performance on the cray C90 and J90, providing a good accounting of bank contention and delay. Second, we show that it often improves performance to have addi..

CiteSeerX

Crossref